AITopics | data classification

Collaborating Authors

data classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Network-Based Detection of Autism Spectrum Disorder Using Sustainable and Non-invasive Salivary Biomarkers

Fernandes, Janayna M., Sabino-Silva, Robinson, Carneiro, Murillo G.

arXiv.org Artificial IntelligenceSep-22-2025

Autism Spectrum Disorder (ASD) lacks reliable biological markers, delaying early diagnosis. Using 159 salivary samples analyzed by ATR-FTIR spectroscopy, we developed GANet, a genetic algorithm-based network optimization framework leveraging PageRank and Degree for importance-based feature characterization. GANet systematically optimizes network structure to extract meaningful patterns from high-dimensional spectral data. It achieved superior performance compared to linear discriminant analysis, support vector machines, and deep learning models, reaching 0.78 accuracy, 0.61 sensitivity, 0.90 specificity, and a 0.74 harmonic mean. These results demonstrate GANet's potential as a robust, bio-inspired, non-invasive tool for precise ASD detection and broader spectral-based health applications.

classification, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.16126

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Add feedback

ATwo-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments

Adhikari, Aayush, Bhatta, Sandesh, Jangwan, Harendra S., Mishra, Amit, Nisa, Khair Ul, Zamani, Abu Taha, Sapkota, Aaron, Muduli, Debendra, Parveen, Nikhat

arXiv.org Artificial IntelligenceJul-8-2025

High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate the explained problems, we have proposedhybrid ensemble feature selection techniques with majority voting classifier for micro array classi f ication. Here we have considered both filter and wrapper-based feature selection techniques including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) for selecting the optimal features. This Artificial Intelligence (AI) approach leverages a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines virtual Central Processing Unit (vCPU) with size 8,16 and 64 bits, have been used to demonstrate the model performance. From the experiment it has been observed that, virtual Central Processing Unit (vCPU)-64 bits provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% with six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, andLung,respectively, validating the effectiveness of the proposed modelin bothlocalandcloud environments.

artificial intelligence, evolutionary algorithm, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.04251

Country:

Asia > India (0.46)
Asia > Middle East (0.28)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

A Metric Topology of Deep Learning for Data Classification

Wu, Jwo-Yuh, Huang, Liang-Chi, Li, Wen-Hsuan, Liu, Chun-Hung

arXiv.org Machine LearningJan-19-2025

Empirically, Deep Learning (DL) has demonstrated unprecedented success in practical applications. However, DL remains by and large a mysterious "black-box", spurring recent theoretical research to build its mathematical foundations. In this paper, we investigate DL for data classification through the prism of metric topology. Considering that conventional Euclidean metric over the network parameter space typically fails to discriminate DL networks according to their classification outcomes, we propose from a probabilistic point of view a meaningful distance measure, whereby DL networks yielding similar classification performances are close. The proposed distance measure defines such an equivalent relation among network parameter vectors that networks performing equally well belong to the same equivalent class. Interestingly, our proposed distance measure can provably serve as a metric on the quotient set modulo the equivalent relation. Then, under quite mild conditions it is shown that, apart from a vanishingly small subset of networks likely to predict non-unique labels, our proposed metric space is compact, and coincides with the well-known quotient topological space. Our study contributes to fundamental understanding of DL, and opens up new ways of studying DL using fruitful metric space theory.

artificial intelligence, machine learning, topology, (13 more...)

arXiv.org Machine Learning

2501.11265

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)

Add feedback

A Bilevel Optimization Framework for Imbalanced Data Classification

Medlin, Karen, Leyffer, Sven, Raghavan, Krishnan

arXiv.org Machine LearningDec-16-2024

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new undersampling approach that: (i) avoids the pitfalls of noise and overlap caused by synthetic data and (ii) avoids the pitfall of under-fitting caused by random undersampling. Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss. Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it. In so doing, our approach rejects majority datapoints redundant to datapoints already accepted and, thereby, finds an optimal subset of majority training data for classification. The accept/reject component of our algorithm is motivated by a bilevel optimization problem uniquely formulated to identify the optimal training set we seek. Experimental results show our proposed technique with F1 scores up to 10% higher than state-of-the-art methods.

artificial intelligence, datapoint, machine learning, (17 more...)

arXiv.org Machine Learning

2410.11171

Country:

North America > United States > North Carolina (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > South Korea > Gyeongsangnam-do > Changwon (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Industry:

Law Enforcement & Public Safety > Fraud (0.47)
Government > Regional Government (0.47)
Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)

Add feedback

Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data

Yıldız, A. Yarkın, Kalayci, Asli

arXiv.org Artificial IntelligenceOct-11-2024

Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Having near-precise decisions based on correct diagnosis can affect a patient's life itself, and may extremely result in a catastrophe if not classified correctly. Several traditional machine learning (ML), such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to the superior performances, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.03705

Country:

North America > United States > Massachusetts > Worcester County > Worcester (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > India (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.71)
Health & Medicine > Therapeutic Area > Oncology (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Add feedback

Improving Fuzzy Rule Classifier with Brain Storm Optimization and Rule Modification

Huang, Yan, Liu, Wei, Zang, Xiaogang

arXiv.org Artificial IntelligenceOct-2-2024

The expanding complexity and dimensionality in the search space can adversely affect inductive learning in fuzzy rule classifiers, thus impacting the scalability and accuracy of fuzzy systems. This research specifically addresses the challenge of diabetic classification by employing the Brain Storm Optimization (BSO) algorithm to propose a novel fuzzy system that redefines rule generation for this context. An exponential model is integrated into the standard BSO algorithm to enhance rule derivation, tailored specifically for diabetes-related data. The innovative fuzzy system is then applied to classification tasks involving diabetic datasets, demonstrating a substantial improvement in classification accuracy, as evidenced by our experiments.

algorithm, classification, membership function, (15 more...)

arXiv.org Artificial Intelligence

2410.01413

Country:

Asia > China > Hubei Province > Wuhan (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

Hayes, Nicole, Merkurjev, Ekaterina, Wei, Guo-Wei

arXiv.org Artificial IntelligenceJun-19-2024

Data sets with imbalanced class sizes, often where one class size is much smaller than that of others, occur extremely often in various applications, including those with biological foundations, such as drug discovery and disease diagnosis. Thus, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to detect can result in heavy costs. However, many data classification algorithms do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this paper, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification problems on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed method not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer model based on an attention mechanism for self-supervised learning. Additionally, the method implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed model is validated using six molecular data sets, and we also provide a thorough comparison to other competing algorithms. The computational experiments show that the proposed method performs better than competing techniques even when the class imbalance ratio is very high.

algorithm, roc-auc score, threshold, (15 more...)

arXiv.org Artificial Intelligence

2406.06479

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Towards Better Serialization of Tabular Data for Few-shot Classification with Large Language Models

Jaitly, Sukriti, Shah, Tanay, Shugani, Ashish, Grewal, Razik Singh

arXiv.org Artificial IntelligenceDec-20-2023

We present a study on the integration of Large Language Models (LLMs) in tabular data classification, emphasizing an efficient framework. Building upon existing work done in TabLLM (arXiv:2210.10723), we introduce three novel serialization techniques, including the standout LaTeX serialization method. This method significantly boosts the performance of LLMs in processing domain-specific datasets, Our method stands out for its memory efficiency and ability to fully utilize complex data structures. Through extensive experimentation, including various serialization approaches like feature combination and importance, we demonstrate our work's superiority in accuracy and efficiency over traditional models.

classification, dataset, tabular data, (15 more...)

arXiv.org Artificial Intelligence

2312.12464

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

An alternative to SVM Method for Data Classification

Remaki, Lakhdar

arXiv.org Artificial IntelligenceAug-20-2023

Support vector machine (SVM), is a popular kernel method for data classification that demonstrated its efficiency for a large range of practical applications. The method suffers, however, from some weaknesses including; time processing, risk of failure of the optimization process for high dimension cases, generalization to multi-classes, unbalanced classes, and dynamic classification. In this paper an alternative method is proposed having a similar performance, with a sensitive improvement of the aforementioned shortcomings. The new method is based on a minimum distance to optimal subspaces containing the mapped original classes.

artificial intelligence, machine learning, subspace, (14 more...)

arXiv.org Artificial Intelligence

2308.11579

Country:

North America > United States (0.05)
Asia > Taiwan (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.57)

Add feedback

Multiclass classification for multidimensional functional data through deep neural networks

Wang, Shuoyang, Cao, Guanqun

arXiv.org Artificial IntelligenceMay-23-2023

The intrinsically infinite-dimensional features of the functional observations over multidimensional domains render the standard classification methods effectively inapplicable. To address this problem, we introduce a novel multiclass functional deep neural network (mfDNN) classifier as an innovative data mining and classification tool. Specifically, we consider sparse deep neural network architecture with rectifier linear unit (ReLU) activation function and minimize the cross-entropy loss in the multiclass classification setup. This neural network architecture allows us to employ modern computational tools in the implementation. The convergence rates of the misclassification risk functions are also derived for both fully observed and discretely observed multidimensional functional data. We demonstrate the performance of mfDNN on simulated data and several benchmark datasets from different application domains.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.13349

Country: North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Add feedback